Truncated Normal distribution (truncnorm)#

A truncated normal is what you get when a normal random variable is restricted to an interval. Formally, it’s a normal distribution conditioned on lying between a lower bound and an upper bound.

It’s a natural model whenever the untruncated normal story is reasonable (additive noise, central-limit behavior), but the variable is physically or logically constrained.


1) Title & classification#

| Item | Value |
| --- | --- |
| Name | Truncated normal (truncnorm) |
| Type | Continuous |
| Support | \(x \in [\ell, u]\) with \(\ell < u\) (bounds may be \(\pm\infty\)) |
| Parameters | location \(\mu\in\mathbb{R}\), scale \(\sigma>0\), bounds \(\ell<u\) |
| SciPy shape params | \(a = (\ell-\mu)/\sigma\), \(b=(u-\mu)/\sigma\) with \(a<b\) |

A compact way to write the model is:

\[ X \sim \mathcal{N}(\mu,\sigma^2)\;\big|\; \ell \le X \le u. \]

What you’ll be able to do after this notebook#

  • write the PDF/CDF of a truncated normal and map parameters to SciPy

  • compute mean/variance/skewness/kurtosis (and understand why the mean shifts)

  • interpret how \((\mu,\sigma,\ell,u)\) change the shape

  • derive the log-likelihood and fit \((\mu,\sigma)\) by MLE when truncation is known

  • sample from a truncated normal using NumPy only (rejection sampling)

import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

from scipy import optimize, special, stats

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)

rng = np.random.default_rng(42)

2) Intuition & motivation#

What it models#

Truncation occurs when values outside a range are impossible to observe (or are removed from the population). For example, imagine an underlying measurement model:

\[Y = \mu + \sigma Z,\quad Z\sim\mathcal{N}(0,1),\]

but the process can only produce values in \([\ell,u]\). Then the observed variable is the conditional distribution:

\[X = Y\mid (\ell\le Y \le u).\]

This is not the same as clipping (censoring), where out-of-range values are recorded at the boundary rather than discarded.
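The distinction is easy to see in code. A minimal NumPy sketch (variable names are illustrative): truncation discards out-of-range draws, while clipping piles them up at the bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=0.0, scale=1.0, size=100_000)

lower, upper = -1.0, 1.0

# Truncation: out-of-range draws are discarded entirely.
truncated = y[(y >= lower) & (y <= upper)]

# Clipping (censoring): out-of-range draws are recorded at the boundary.
clipped = np.clip(y, lower, upper)

# The clipped sample has point masses at the bounds; the truncated one does not.
print(np.mean(clipped == upper))    # a substantial fraction sits exactly at the bound
print(np.mean(truncated == upper))  # essentially zero for a continuous draw
```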

Typical real-world use cases#

  • Physical constraints: lengths, concentrations, or prices that cannot go negative (often \(\ell=0\)).

  • Quality control / selection bias: only items within spec are kept; others are discarded.

  • Psychometrics / surveys: scores restricted to a finite scale (e.g. 0–100).

  • Truncated regression: outcomes are observed only when they fall within a range (distinct from Tobit censoring).

  • Bayesian priors with constraints: a normal prior truncated to enforce bounds.

Relations to other distributions#

  • As \(\ell\to-\infty\) and \(u\to\infty\), the truncated normal becomes an ordinary normal.

  • One-sided truncation with \((\ell=0,\;u=\infty)\) yields a half-normal when \(\mu=0\) (up to reparameterization).

  • Truncation is a generic operation: many “bounded” distributions are best understood as conditional versions of simpler base models.
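These limiting relations can be verified numerically with SciPy (a quick check, not a proof):

```python
import numpy as np
from scipy import stats

x = np.linspace(-3, 3, 7)

# Very wide bounds: the truncated normal is numerically the ordinary normal.
wide = stats.truncnorm(-50, 50, loc=0.0, scale=1.0)
assert np.allclose(wide.pdf(x), stats.norm.pdf(x))

# One-sided truncation at 0 with mu = 0 recovers the half-normal.
xp = np.linspace(0.1, 3.0, 5)
half = stats.truncnorm(0.0, np.inf, loc=0.0, scale=1.0)
assert np.allclose(half.pdf(xp), stats.halfnorm.pdf(xp))
```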

3) Formal definition#

Let \(\varphi\) and \(\Phi\) denote the standard normal PDF and CDF:

\[ \varphi(z)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right), \qquad \Phi(z)=\int_{-\infty}^{z}\varphi(t)\,dt. \]

Define standardized truncation points

\[ \alpha = \frac{\ell-\mu}{\sigma}, \qquad \beta = \frac{u-\mu}{\sigma}, \qquad Z = \Phi(\beta)-\Phi(\alpha). \]

Here \(Z\) is the normalization constant (the probability that an untruncated normal lands in \([\ell,u]\)).

PDF#

For \(x\in[\ell,u]\):

\[ f(x\mid\mu,\sigma,\ell,u) = \frac{1}{\sigma\,Z}\;\varphi\!\left(\frac{x-\mu}{\sigma}\right), \qquad f(x)=0\;\text{outside }[\ell,u]. \]

CDF#

\[\begin{split} F(x)= \begin{cases} 0,& x<\ell,\\ \dfrac{\Phi\bigl((x-\mu)/\sigma\bigr)-\Phi(\alpha)}{Z},& \ell\le x\le u,\\ 1,& x>u. \end{cases} \end{split}\]

SciPy parameterization#

scipy.stats.truncnorm uses standardized bounds \((a,b)\) plus loc and scale:

\[ (a,b,\texttt{loc},\texttt{scale}) =\Bigl(\alpha,\beta,\mu,\sigma\Bigr). \]

So for known bounds \([\ell,u]\) you typically compute:

from scipy import stats

a = (lower - mu) / sigma
b = (upper - mu) / sigma
rv = stats.truncnorm(a, b, loc=mu, scale=sigma)
LOG_SQRT_2PI = 0.5 * np.log(2 * np.pi)


def norm_logpdf(z):
    z = np.asarray(z, dtype=float)
    return -0.5 * z**2 - LOG_SQRT_2PI


def norm_pdf(z):
    return np.exp(norm_logpdf(z))


def norm_cdf(z):
    # SciPy's ndtr is a fast and numerically stable normal CDF implementation
    return special.ndtr(z)


def norm_logcdf(z):
    # log Phi(z), stable for large negative z
    return special.log_ndtr(z)


def _logdiffexp(log_a, log_b):
    """Compute log(exp(log_a) - exp(log_b)) for log_a >= log_b."""
    return log_a + np.log1p(-np.exp(log_b - log_a))


def truncnorm_standardized_bounds(mu, sigma, lower, upper):
    mu = float(mu)
    sigma = float(sigma)
    lower = float(lower)
    upper = float(upper)

    if sigma <= 0:
        raise ValueError("sigma must be > 0")
    if not (lower < upper):
        raise ValueError("require lower < upper")

    alpha = (lower - mu) / sigma
    beta = (upper - mu) / sigma
    return alpha, beta


def truncnorm_logZ(alpha, beta):
    """Compute log(Z) where Z = Phi(beta) - Phi(alpha)."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    if np.any(beta <= alpha):
        raise ValueError("require beta > alpha")

    # Direct CDF difference works well in the lower tail / central region.
    logPhi_a = norm_logcdf(alpha)
    logPhi_b = norm_logcdf(beta)
    logZ_cdf = _logdiffexp(logPhi_b, logPhi_a)

    # In the upper tail, it's often more stable to use survival functions:
    # Phi(beta) - Phi(alpha) = sf(alpha) - sf(beta) = Phi(-alpha) - Phi(-beta).
    logSf_a = norm_logcdf(-alpha)
    logSf_b = norm_logcdf(-beta)
    logZ_sf = _logdiffexp(logSf_a, logSf_b)

    use_sf = alpha > 0
    return np.where(use_sf, logZ_sf, logZ_cdf)


def truncnorm_logpdf(x, mu, sigma, lower, upper):
    """Log-PDF of TruncNorm(mu, sigma^2; [lower, upper])."""
    x = np.asarray(x, dtype=float)
    alpha, beta = truncnorm_standardized_bounds(mu, sigma, lower, upper)

    z = (x - mu) / sigma
    logZ = truncnorm_logZ(alpha, beta)

    logpdf = norm_logpdf(z) - np.log(sigma) - logZ
    return np.where((x >= lower) & (x <= upper), logpdf, -np.inf)


def truncnorm_pdf(x, mu, sigma, lower, upper):
    return np.exp(truncnorm_logpdf(x, mu, sigma, lower, upper))


def truncnorm_cdf(x, mu, sigma, lower, upper):
    """CDF of TruncNorm(mu, sigma^2; [lower, upper])."""
    x = np.asarray(x, dtype=float)
    alpha, beta = truncnorm_standardized_bounds(mu, sigma, lower, upper)

    z = (x - mu) / sigma
    Z = norm_cdf(beta) - norm_cdf(alpha)

    cdf = (norm_cdf(z) - norm_cdf(alpha)) / Z
    cdf = np.where(x < lower, 0.0, cdf)
    cdf = np.where(x > upper, 1.0, cdf)
    return np.clip(cdf, 0.0, 1.0)


def truncnorm_moments(mu, sigma, lower, upper):
    """Mean/variance/skewness/excess kurtosis (using standardized raw moments)."""
    alpha, beta = truncnorm_standardized_bounds(mu, sigma, lower, upper)

    Z = norm_cdf(beta) - norm_cdf(alpha)
    phi_a = norm_pdf(alpha)
    phi_b = norm_pdf(beta)

    # Standardized raw moments for T ~ N(0,1) | alpha <= T <= beta
    m1 = (phi_a - phi_b) / Z
    m2 = 1.0 + (alpha * phi_a - beta * phi_b) / Z
    m3 = ((alpha**2 + 2.0) * phi_a - (beta**2 + 2.0) * phi_b) / Z
    m4 = 3.0 + ((alpha**3 + 3.0 * alpha) * phi_a - (beta**3 + 3.0 * beta) * phi_b) / Z

    var_t = m2 - m1**2

    # Central moments (standardized)
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
    mu4 = m4 - 4 * m1 * m3 + 6 * (m1**2) * m2 - 3 * m1**4

    if var_t <= 0:
        skew = np.nan
        excess_kurt = np.nan
    else:
        skew = mu3 / (var_t ** 1.5)
        excess_kurt = mu4 / (var_t**2) - 3.0

    mean = mu + sigma * m1
    var = (sigma**2) * var_t

    return {
        'alpha': alpha,
        'beta': beta,
        'Z': Z,
        'mean': mean,
        'variance': var,
        'skewness': skew,
        'excess_kurtosis': excess_kurt,
        'm1': m1,
        'm2': m2,
        'm3': m3,
        'm4': m4,
    }


def truncnorm_entropy(mu, sigma, lower, upper):
    """Differential entropy (natural log)."""
    alpha, beta = truncnorm_standardized_bounds(mu, sigma, lower, upper)

    Z = norm_cdf(beta) - norm_cdf(alpha)
    phi_a = norm_pdf(alpha)
    phi_b = norm_pdf(beta)

    return (
        0.5 * np.log(2 * np.pi * np.e * sigma**2)
        + np.log(Z)
        + (alpha * phi_a - beta * phi_b) / (2 * Z)
    )


def truncnorm_mgf(t, mu, sigma, lower, upper):
    """MGF M_X(t) for real t (exists for all real t since support is bounded)."""
    t = np.asarray(t, dtype=float)
    alpha, beta = truncnorm_standardized_bounds(mu, sigma, lower, upper)
    Z = norm_cdf(beta) - norm_cdf(alpha)

    num = norm_cdf(beta - sigma * t) - norm_cdf(alpha - sigma * t)
    return np.exp(mu * t + 0.5 * (sigma * t) ** 2) * (num / Z)


def truncnorm_loglik(x, mu, sigma, lower, upper):
    x = np.asarray(x, dtype=float)
    return np.sum(truncnorm_logpdf(x, mu, sigma, lower, upper))

4) Moments & properties#

Let \(\alpha=(\ell-\mu)/\sigma\), \(\beta=(u-\mu)/\sigma\), and \(Z=\Phi(\beta)-\Phi(\alpha)\). Define the standardized truncated variable:

\[ T \sim \mathcal{N}(0,1)\mid (\alpha\le T\le \beta), \qquad X = \mu + \sigma T. \]

A convenient summary is:

| Quantity | Expression |
| --- | --- |
| Mean | \(\mathbb{E}[X]=\mu+\sigma\,\dfrac{\varphi(\alpha)-\varphi(\beta)}{Z}\) |
| Variance | \(\mathrm{Var}(X)=\sigma^2\Bigl(1+\dfrac{\alpha\varphi(\alpha)-\beta\varphi(\beta)}{Z}-\bigl(\dfrac{\varphi(\alpha)-\varphi(\beta)}{Z}\bigr)^2\Bigr)\) |
| Skewness | computed from standardized central moments of \(T\) (see below) |
| Excess kurtosis | computed from standardized central moments of \(T\) |
| MGF | \(M_X(t)=e^{\mu t+\tfrac12\sigma^2 t^2}\,\dfrac{\Phi(\beta-\sigma t)-\Phi(\alpha-\sigma t)}{Z}\) |
| CF | \(\varphi_X(\omega)=M_X(i\omega)\) (complex extension of \(\Phi\)) |
| Entropy | \(h(X)=\tfrac12\log(2\pi e\sigma^2)+\log Z+\dfrac{\alpha\varphi(\alpha)-\beta\varphi(\beta)}{2Z}\) |

Raw-moment formulas (standardized)#

For \(T\) as above:

(28)#\[\begin{align} \mathbb{E}[T] &= \frac{\varphi(\alpha)-\varphi(\beta)}{Z}\\ \mathbb{E}[T^2] &= 1 + \frac{\alpha\varphi(\alpha)-\beta\varphi(\beta)}{Z}\\ \mathbb{E}[T^3] &= \frac{(\alpha^2+2)\varphi(\alpha)-(\beta^2+2)\varphi(\beta)}{Z}\\ \mathbb{E}[T^4] &= 3 + \frac{(\alpha^3+3\alpha)\varphi(\alpha)-(\beta^3+3\beta)\varphi(\beta)}{Z} \end{align}\]

From these, compute central moments and then skewness/kurtosis in the usual way.
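Explicitly, writing \(m_k=\mathbb{E}[T^k]\) for the raw moments above and \(\mathrm{Var}(T)=m_2-m_1^2\):

\[\begin{align} \mu_3 &= m_3 - 3 m_1 m_2 + 2 m_1^3, \qquad \mu_4 = m_4 - 4 m_1 m_3 + 6 m_1^2 m_2 - 3 m_1^4,\\ \text{skew}(T) &= \frac{\mu_3}{\mathrm{Var}(T)^{3/2}}, \qquad \text{ex.\,kurt}(T) = \frac{\mu_4}{\mathrm{Var}(T)^{2}} - 3. \end{align}\]

Since \(X=\mu+\sigma T\) is an affine transform, the skewness and excess kurtosis of \(X\) equal those of \(T\).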

mu, sigma, lower, upper = 0.5, 1.2, -1.0, 2.0

mom = truncnorm_moments(mu, sigma, lower, upper)
{
    'mean': mom['mean'],
    'variance': mom['variance'],
    'skewness': mom['skewness'],
    'excess_kurtosis': mom['excess_kurtosis'],
    'entropy': truncnorm_entropy(mu, sigma, lower, upper),
    'mgf(t=0.5)': truncnorm_mgf(0.5, mu, sigma, lower, upper),
}
{'mean': 0.5,
 'variance': 0.6063036262023139,
 'skewness': 0.0,
 'excess_kurtosis': -0.9776753030120942,
 'entropy': 1.0744134977913906,
 'mgf(t=0.5)': 1.3838567001159119}

5) Parameter interpretation#

Think in two layers:

  1. Underlying normal: \(Y\sim\mathcal{N}(\mu,\sigma^2)\)

  2. Conditioning (truncation): keep only values in \([\ell,u]\)

How parameters affect shape:

  • \(\mu\) (location) shifts the underlying normal; after truncation, the actual mean is pulled toward the interval.

  • \(\sigma\) (scale) controls spread; if \(\sigma\) is large relative to \((u-\ell)\), most of the underlying mass is discarded, and the shape can become strongly skewed when \(\mu\) sits near or outside a bound.

  • Bounds \((\ell,u)\) cut the tails; a narrow interval produces a near-uniform-looking bump (the log-density is still quadratic, i.e. “normal-shaped”).

In standardized coordinates, \((a,b)=((\ell-\mu)/\sigma,(u-\mu)/\sigma)\) are the bounds measured in standard deviations from \(\mu\).

# How truncation changes shape

def plot_pdf_family(param_sets, title):
    fig = go.Figure()

    for ps in param_sets:
        mu, sigma, lower, upper, name = ps
        x = np.linspace(
            lower if np.isfinite(lower) else mu - 4 * sigma,
            upper if np.isfinite(upper) else mu + 4 * sigma,
            600,
        )
        y = truncnorm_pdf(x, mu, sigma, lower, upper)
        fig.add_trace(go.Scatter(x=x, y=y, mode='lines', name=name))

    fig.update_layout(title=title, xaxis_title='x', yaxis_title='pdf')
    return fig

param_sets_1 = [
    (0.0, 1.0, -2.0, 2.0, "mu=0, sigma=1, [-2,2]"),
    (0.0, 1.0, 0.0, np.inf, "mu=0, sigma=1, [0,∞)"),
    (0.0, 1.0, -0.5, 0.5, "mu=0, sigma=1, [-0.5,0.5]"),
    (0.0, 1.0, 2.0, np.inf, "mu=0, sigma=1, [2,∞)"),
]

fig1 = plot_pdf_family(param_sets_1, "Truncated normal PDFs (varying bounds)")
fig1.show()

param_sets_2 = [
    (-1.0, 1.0, -1.0, 1.0, "mu=-1, sigma=1, [-1,1]"),
    (0.0, 1.0, -1.0, 1.0, "mu=0, sigma=1, [-1,1]"),
    (1.0, 1.0, -1.0, 1.0, "mu=1, sigma=1, [-1,1]"),
]

fig2 = plot_pdf_family(param_sets_2, "PDFs with fixed bounds but different mu")
fig2.show()

6) Derivations#

Below we derive the mean/variance for the standardized truncated normal. Let

\[ T \sim \mathcal{N}(0,1)\mid (\alpha\le T\le \beta), \qquad f_T(t)=\frac{\varphi(t)}{Z}\;\mathbf{1}_{[\alpha,\beta]}(t), \qquad Z=\Phi(\beta)-\Phi(\alpha). \]

Expectation#

(29)#\[\begin{align} \mathbb{E}[T] &= \frac{1}{Z}\int_{\alpha}^{\beta} t\,\varphi(t)\,dt. \end{align}\]

Use the key identity \(\varphi'(t)=-t\varphi(t)\), so \(t\varphi(t)=-\varphi'(t)\):

(30)#\[\begin{align} \mathbb{E}[T] &= \frac{1}{Z}\int_{\alpha}^{\beta} -\varphi'(t)\,dt = \frac{1}{Z}\bigl[ -\varphi(t)\bigr]_{\alpha}^{\beta} = \frac{\varphi(\alpha)-\varphi(\beta)}{Z}. \end{align}\]

Then \(X=\mu+\sigma T\) gives \(\mathbb{E}[X]=\mu+\sigma\mathbb{E}[T]\).

Variance#

First compute \(\mathbb{E}[T^2]\):

(31)#\[\begin{align} \mathbb{E}[T^2] &= \frac{1}{Z}\int_{\alpha}^{\beta} t^2\varphi(t)\,dt. \end{align}\]

Use integration by parts with \(u=t\) and \(dv=t\varphi(t)dt=-\varphi'(t)dt\):

(32)#\[\begin{align} \int t^2\varphi(t)\,dt &= \int t\,(t\varphi(t))\,dt = \int t\,(-\varphi'(t))\,dt = -t\varphi(t) + \int \varphi(t)\,dt = -t\varphi(t) + \Phi(t). \end{align}\]

So

(33)#\[\begin{align} \mathbb{E}[T^2] &= \frac{1}{Z}\Bigl(\bigl[-t\varphi(t)+\Phi(t)\bigr]_{\alpha}^{\beta}\Bigr) = \frac{1}{Z}\Bigl(Z + \alpha\varphi(\alpha)-\beta\varphi(\beta)\Bigr) = 1 + \frac{\alpha\varphi(\alpha)-\beta\varphi(\beta)}{Z}. \end{align}\]

Finally

\[ \mathrm{Var}(T)=\mathbb{E}[T^2]-\mathbb{E}[T]^2, \qquad \mathrm{Var}(X)=\sigma^2\,\mathrm{Var}(T). \]
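As a sanity check on the derivation, the closed forms for \(\mathbb{E}[T]\) and \(\mathbb{E}[T^2]\) can be compared against direct numerical integration (a sketch; the specific bounds are just an example):

```python
import numpy as np
from scipy import integrate, stats

alpha, beta = -1.25, 1.25
Z = stats.norm.cdf(beta) - stats.norm.cdf(alpha)
f = lambda t: stats.norm.pdf(t) / Z  # density of T on [alpha, beta]

# Raw moments by quadrature
m1_num, _ = integrate.quad(lambda t: t * f(t), alpha, beta)
m2_num, _ = integrate.quad(lambda t: t**2 * f(t), alpha, beta)

# Closed forms from the derivation
m1 = (stats.norm.pdf(alpha) - stats.norm.pdf(beta)) / Z
m2 = 1.0 + (alpha * stats.norm.pdf(alpha) - beta * stats.norm.pdf(beta)) / Z

assert np.isclose(m1_num, m1)
assert np.isclose(m2_num, m2)
```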

Likelihood#

For observations \(x_1,\dots,x_n\in[\ell,u]\):

\[ \log f(x_i\mid\mu,\sigma,\ell,u) = -\frac12\left(\frac{x_i-\mu}{\sigma}\right)^2 - \log\sigma - \log Z - \frac12\log(2\pi). \]

The log-likelihood is the sum over \(i\). Because \(Z=\Phi(\beta)-\Phi(\alpha)\) depends on \((\mu,\sigma)\), MLE typically requires numerical optimization.

def fit_truncnorm_mle(x, lower, upper, mu0=None, sigma0=None):
    """MLE for (mu, sigma) given known truncation bounds [lower, upper]."""
    x = np.asarray(x, dtype=float)
    lower = float(lower)
    upper = float(upper)

    if not np.all((x >= lower) & (x <= upper)):
        raise ValueError("all observations must lie within [lower, upper]")

    if mu0 is None:
        mu0 = float(np.mean(x))
    if sigma0 is None:
        sigma0 = float(np.std(x, ddof=1))
    sigma0 = max(sigma0, 1e-6)

    def nll(theta):
        mu = float(theta[0])
        sigma = float(np.exp(theta[1]))
        ll = truncnorm_loglik(x, mu, sigma, lower, upper)
        if not np.isfinite(ll):
            return 1e300
        return -ll

    res = optimize.minimize(
        nll,
        x0=np.array([mu0, np.log(sigma0)]),
        method='L-BFGS-B',
    )
    mu_hat = float(res.x[0])
    sigma_hat = float(np.exp(res.x[1]))
    return mu_hat, sigma_hat, res


# Demo: recover parameters from synthetic data
mu_true, sigma_true, lower, upper = 0.5, 1.2, -1.0, 2.0

a = (lower - mu_true) / sigma_true
b = (upper - mu_true) / sigma_true

x = stats.truncnorm(a, b, loc=mu_true, scale=sigma_true).rvs(size=3000, random_state=rng)

mu_hat, sigma_hat, res = fit_truncnorm_mle(x, lower, upper)
(mu_true, sigma_true), (mu_hat, sigma_hat), res.success
((0.5, 1.2), (0.4995503501371839, 1.229687469851479), True)

7) Sampling & simulation (NumPy-only)#

A conceptually simple sampler is rejection sampling from the underlying normal.

Algorithm (accept–reject)#

  1. Propose \(Y\sim\mathcal{N}(\mu,\sigma^2)\).

  2. If \(Y\in[\ell,u]\), accept and output \(X=Y\). Otherwise reject and try again.

This works because the truncated normal is exactly the conditional distribution \(Y\mid(\ell\le Y\le u)\).

Efficiency#

The acceptance probability is

\[\mathbb{P}(\ell\le Y\le u)=Z=\Phi(\beta)-\Phi(\alpha).\]

So the expected number of proposals per accepted sample is \(1/Z\). If \(Z\) is tiny (very narrow interval or far out in the tails), naive rejection can be slow.
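For intuition, the expected proposal count \(1/Z\) can be computed directly (the helper name here is illustrative):

```python
import numpy as np
from scipy import stats

def expected_proposals(mu, sigma, lower, upper):
    """Expected proposals per accepted sample for naive rejection: 1/Z."""
    alpha = (lower - mu) / sigma
    beta = (upper - mu) / sigma
    Z = stats.norm.cdf(beta) - stats.norm.cdf(alpha)
    return 1.0 / Z

# For a standard normal, cost is mild near the center but explodes in the tail.
for lo, hi in [(-2.0, 2.0), (0.0, np.inf), (2.0, np.inf), (4.0, np.inf)]:
    print((lo, hi), expected_proposals(0.0, 1.0, lo, hi))
```

For \([4,\infty)\) the expected cost is already tens of thousands of proposals per accepted draw, which is why tail-specialized samplers exist.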

def truncnorm_rvs_rejection_numpy(mu, sigma, lower, upper, size=1, rng=None, batch_multiplier=4):
    """Sample from TruncNorm(mu, sigma^2; [lower, upper]) using NumPy-only rejection sampling."""
    if rng is None:
        rng = np.random.default_rng()

    mu = float(mu)
    sigma = float(sigma)
    lower = float(lower)
    upper = float(upper)

    if sigma <= 0:
        raise ValueError("sigma must be > 0")
    if not (lower < upper):
        raise ValueError("require lower < upper")

    size_tuple = (size,) if isinstance(size, int) else tuple(size)
    n = int(np.prod(size_tuple))

    if np.isneginf(lower) and np.isposinf(upper):
        return rng.normal(loc=mu, scale=sigma, size=size_tuple), n

    out = np.empty(n, dtype=float)
    filled = 0
    proposed = 0

    while filled < n:
        m = n - filled
        batch = max(batch_multiplier * m, 128)

        y = rng.normal(loc=mu, scale=sigma, size=batch)
        proposed += y.size

        acc = y[(y >= lower) & (y <= upper)]
        k = min(acc.size, m)
        if k > 0:
            out[filled : filled + k] = acc[:k]
            filled += k

    return out.reshape(size_tuple), proposed


s, proposed = truncnorm_rvs_rejection_numpy(0.5, 1.2, -1.0, 2.0, size=50_000, rng=rng)

s.min(), s.max(), s.mean(), proposed / s.size
(-0.9997033556964805, 1.999994264038702, 0.49656530813860444, 4.0)

8) Visualization#

We’ll visualize:

  • the PDF and CDF

  • Monte Carlo samples (histogram + empirical CDF)

We’ll use the formulas from Sections 3–4 and the NumPy-only sampler from Section 7.

mu, sigma, lower, upper = 0.5, 1.2, -1.0, 2.0

x_grid = np.linspace(lower, upper, 600)

pdf = truncnorm_pdf(x_grid, mu, sigma, lower, upper)
cdf = truncnorm_cdf(x_grid, mu, sigma, lower, upper)

fig_pdf = px.line(x=x_grid, y=pdf, title="Truncated normal PDF", labels={"x": "x", "y": "pdf"})
fig_pdf.show()

fig_cdf = px.line(x=x_grid, y=cdf, title="Truncated normal CDF", labels={"x": "x", "y": "cdf"})
fig_cdf.show()

# Monte Carlo samples
samples, _ = truncnorm_rvs_rejection_numpy(mu, sigma, lower, upper, size=20_000, rng=rng)

fig_hist = px.histogram(
    x=samples,
    nbins=60,
    histnorm='probability density',
    title="Monte Carlo samples (histogram) with PDF overlay",
    labels={"x": "x"},
)
fig_hist.add_trace(go.Scatter(x=x_grid, y=pdf, mode='lines', name='theory pdf'))
fig_hist.show()

# Empirical CDF vs theoretical CDF
xs = np.sort(samples)
ecdf = np.arange(1, xs.size + 1) / xs.size

fig_ecdf = go.Figure()
fig_ecdf.add_trace(go.Scatter(x=xs, y=ecdf, mode='lines', name='empirical CDF'))
fig_ecdf.add_trace(go.Scatter(x=x_grid, y=cdf, mode='lines', name='theory CDF'))
fig_ecdf.update_layout(title="Empirical CDF vs theoretical CDF", xaxis_title='x', yaxis_title='cdf')
fig_ecdf.show()

9) SciPy integration (scipy.stats.truncnorm)#

SciPy’s truncated normal is scipy.stats.truncnorm.

Key points:

  • a and b are standardized bounds (in units of standard deviations).

  • loc and scale are the underlying normal’s \((\mu,\sigma)\).

Mapping from absolute bounds \([\ell,u]\):

\[ a = \frac{\ell-\mu}{\sigma},\qquad b = \frac{u-\mu}{\sigma}. \]

Then:

rv = stats.truncnorm(a, b, loc=mu, scale=sigma)

If you call fit, SciPy estimates (a, b, loc, scale) directly. If your truncation bounds are known and fixed in absolute units, you usually want a custom likelihood fit (Section 6), because (a,b) depend on \((\mu,\sigma)\) through the mapping above.

mu, sigma, lower, upper = 0.5, 1.2, -1.0, 2.0

a = (lower - mu) / sigma
b = (upper - mu) / sigma

rv = stats.truncnorm(a, b, loc=mu, scale=sigma)

# pdf/cdf/rvs
x_grid = np.linspace(lower, upper, 5)
rv.pdf(x_grid), rv.cdf(x_grid), rv.rvs(size=3, random_state=rng)
(array([0.193 , 0.3467, 0.4215, 0.3467, 0.193 ]),
 array([0.    , 0.2033, 0.5   , 0.7967, 1.    ]),
 array([1.424 , 1.7351, 0.2584]))
# Compare SciPy moments to our formulas

m_scipy, v_scipy, skew_scipy, kurt_scipy = rv.stats(moments='mvsk')

mom = truncnorm_moments(mu, sigma, lower, upper)

{
    'mean_scipy': float(m_scipy),
    'mean_formula': mom['mean'],
    'var_scipy': float(v_scipy),
    'var_formula': mom['variance'],
    'skew_scipy': float(skew_scipy),
    'skew_formula': mom['skewness'],
    'excess_kurt_scipy': float(kurt_scipy),
    'excess_kurt_formula': mom['excess_kurtosis'],
}
{'mean_scipy': 0.5,
 'mean_formula': 0.5,
 'var_scipy': 0.6063036262023138,
 'var_formula': 0.6063036262023139,
 'skew_scipy': 0.0,
 'skew_formula': 0.0,
 'excess_kurt_scipy': -0.9776753030120942,
 'excess_kurt_formula': -0.9776753030120942}
# Fit example (SciPy parameterization)

samples = rv.rvs(size=5000, random_state=rng)

a_hat, b_hat, loc_hat, scale_hat = stats.truncnorm.fit(samples)

{
    'true': (a, b, mu, sigma),
    'fit': (a_hat, b_hat, loc_hat, scale_hat),
}
{'true': (-1.25, 1.25, 0.5, 1.2),
 'fit': (-1.1095450525138721,
  0.9161971056921887,
  0.6441090041345641,
  1.4780312277747587)}

10) Statistical use cases#

A) Hypothesis testing#

When data are truncated, using “ordinary” normal-model tests can be badly biased. A principled approach is to write down the truncated likelihood and use standard likelihood tools:

  • Likelihood ratio tests (LRT)

  • Wald tests (via the observed Fisher information)

  • Parametric bootstrap (often more accurate in small samples)

B) Bayesian modeling#

A truncated normal is a common way to build a normal prior with hard constraints. With a normal likelihood, the untruncated posterior is normal; adding bounds yields a truncated normal posterior.

C) Generative modeling#

Truncated normals are used whenever you want “Gaussian-like” randomness but want to avoid extreme outliers. A well-known example is truncated normal weight initialization in deep learning, where weights are drawn from a normal distribution and redrawn whenever they fall outside a range (e.g. \(\pm2\sigma\)).

# A) Likelihood ratio test for H0: mu = mu0 (bounds known)

lower, upper = -1.0, 2.0
mu_true, sigma_true = 0.6, 1.1
mu0 = 0.0

a = (lower - mu_true) / sigma_true
b = (upper - mu_true) / sigma_true

data = stats.truncnorm(a, b, loc=mu_true, scale=sigma_true).rvs(size=2500, random_state=rng)

# Unrestricted MLE
mu_hat, sigma_hat, _ = fit_truncnorm_mle(data, lower, upper)
ll_alt = truncnorm_loglik(data, mu_hat, sigma_hat, lower, upper)

# Null: mu fixed, optimize sigma only

def fit_sigma_given_mu(x, mu_fixed, lower, upper, sigma0=None):
    x = np.asarray(x, dtype=float)
    if sigma0 is None:
        sigma0 = float(np.std(x, ddof=1))
    sigma0 = max(sigma0, 1e-6)

    def nll(log_sigma):
        sigma = float(np.exp(log_sigma))
        ll = truncnorm_loglik(x, mu_fixed, sigma, lower, upper)
        if not np.isfinite(ll):
            return 1e300
        return -ll

    res = optimize.minimize_scalar(nll, bracket=(np.log(sigma0) - 1.0, np.log(sigma0) + 1.0))
    return float(np.exp(res.x)), res

sigma0_hat, _ = fit_sigma_given_mu(data, mu0, lower, upper)
ll_null = truncnorm_loglik(data, mu0, sigma0_hat, lower, upper)

lr_stat = 2 * (ll_alt - ll_null)
p_value = stats.chi2.sf(lr_stat, df=1)

{
    'mu_hat': mu_hat,
    'sigma_hat': sigma_hat,
    'sigma_hat_under_H0': sigma0_hat,
    'lr_stat': lr_stat,
    'p_value_chi2_approx': p_value,
}
{'mu_hat': 0.5615952358558073,
 'sigma_hat': 1.023327322118709,
 'sigma_hat_under_H0': 1.862372609449451,
 'lr_stat': 166.35630655358545,
 'p_value_chi2_approx': 4.6241821384229533e-38}
# B) Bayesian modeling: truncated-normal prior + normal likelihood

# Parameter theta is constrained to [0, 1]
lower, upper = 0.0, 1.0

# Prior: theta ~ TruncNorm(mu0, sigma0^2; [0,1])
mu0, sigma0 = 0.5, 0.25

# Data: y_i | theta ~ Normal(theta, sigma_y^2)
sigma_y = 0.10

theta_true = 0.7
n = 30
y = rng.normal(loc=theta_true, scale=sigma_y, size=n)

# Untruncated Normal-Normal posterior parameters
post_var = 1.0 / (1.0 / sigma0**2 + n / sigma_y**2)
post_mu = post_var * (mu0 / sigma0**2 + n * y.mean() / sigma_y**2)
post_sigma = float(np.sqrt(post_var))

# With bounds, posterior is truncated normal with the same (post_mu, post_sigma) but restricted to [0,1]
x_grid = np.linspace(lower, upper, 600)
prior_pdf = truncnorm_pdf(x_grid, mu0, sigma0, lower, upper)
post_pdf = truncnorm_pdf(x_grid, post_mu, post_sigma, lower, upper)

fig = go.Figure()
fig.add_trace(go.Scatter(x=x_grid, y=prior_pdf, mode='lines', name='prior'))
fig.add_trace(go.Scatter(x=x_grid, y=post_pdf, mode='lines', name='posterior'))
fig.update_layout(title='Prior vs posterior (both truncated normal)', xaxis_title='theta', yaxis_title='density')
fig.show()

{
    'y_mean': float(y.mean()),
    'post_mu_untruncated': post_mu,
    'post_sigma_untruncated': post_sigma,
    'posterior_mean_truncated': truncnorm_moments(post_mu, post_sigma, lower, upper)['mean'],
}
{'y_mean': 0.7145105676253577,
 'post_mu_untruncated': 0.7133725805292019,
 'post_sigma_untruncated': 0.018208926018230744,
 'posterior_mean_truncated': 0.7133725805292019}
# C) Generative modeling: truncated normal for "Gaussian-like" randomness without extreme outliers

sigma = 1.0
n = 200_000

w_normal = rng.normal(loc=0.0, scale=sigma, size=n)
w_trunc = stats.truncnorm(-2.0, 2.0, loc=0.0, scale=sigma).rvs(size=n, random_state=rng)

{
    'max_abs_normal': float(np.max(np.abs(w_normal))),
    'max_abs_trunc': float(np.max(np.abs(w_trunc))),
}
{'max_abs_normal': 4.673029245895192, 'max_abs_trunc': 1.9999682329410746}
# Compare tails visually

x_grid = np.linspace(-4, 4, 800)

pdf_normal = stats.norm(loc=0, scale=1).pdf(x_grid)
pdf_trunc = stats.truncnorm(-2.0, 2.0, loc=0.0, scale=1.0).pdf(x_grid)

fig = go.Figure()
fig.add_trace(go.Scatter(x=x_grid, y=pdf_normal, mode='lines', name='Normal(0,1)'))
fig.add_trace(go.Scatter(x=x_grid, y=pdf_trunc, mode='lines', name='TruncNorm(0,1;[-2,2])'))
fig.update_layout(title='Tail behavior: normal vs truncated normal', xaxis_title='x', yaxis_title='pdf')
fig.show()

11) Pitfalls#

  • Truncation vs censoring: truncation removes out-of-range values; censoring records them at the boundary.

  • SciPy parameter confusion: a and b are standardized; the actual support is (loc + a*scale, loc + b*scale).

  • Numerical issues: when \(Z=\Phi(\beta)-\Phi(\alpha)\) is tiny, naive subtraction can underflow to 0. Use log-CDF computations (or SciPy’s implementation).

  • Slow rejection sampling: when \(Z\) is tiny, accept–reject becomes inefficient; consider inverse-CDF sampling or specialized tail samplers.

  • Fitting with known bounds: SciPy’s fit estimates (a,b,loc,scale); if you have fixed absolute bounds, use a custom likelihood (Section 6).
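One such alternative is inverse-CDF sampling, which needs no rejection at all. A sketch (the function name is illustrative; `ndtr`/`ndtri` are SciPy's normal CDF and quantile function):

```python
import numpy as np
from scipy import special

def truncnorm_rvs_inverse_cdf(mu, sigma, lower, upper, size, rng):
    """Sample via F^{-1}(U): cost per draw is independent of Z."""
    alpha = (lower - mu) / sigma
    beta = (upper - mu) / sigma
    pa, pb = special.ndtr(alpha), special.ndtr(beta)
    u = rng.uniform(pa, pb, size=size)    # uniform on [Phi(alpha), Phi(beta)]
    return mu + sigma * special.ndtri(u)  # map back through the normal quantile

rng = np.random.default_rng(0)
s = truncnorm_rvs_inverse_cdf(0.0, 1.0, 2.0, np.inf, size=10_000, rng=rng)
print(s.min())  # every sample lies in [2, inf), with zero rejected proposals
```

Note the caveat from the numerical-issues pitfall still applies: if \(\Phi(\alpha)\) and \(\Phi(\beta)\) are equal to machine precision (intervals very far in the tail), this version needs a survival-function reformulation.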

# SciPy parameterization pitfall: (a, b) are in *standard deviations*, not absolute units

mu, sigma = 0.5, 0.2
lower, upper = 0.0, 1.0

rv_wrong = stats.truncnorm(0.0, 1.0, loc=mu, scale=sigma)
rv_right = stats.truncnorm((lower - mu) / sigma, (upper - mu) / sigma, loc=mu, scale=sigma)

{
    'wrong_support': rv_wrong.support(),
    'right_support': rv_right.support(),
}
{'wrong_support': (0.5, 0.7), 'right_support': (0.0, 1.0)}

12) Summary#

  • truncnorm is a normal distribution conditioned to lie in \([\ell,u]\).

  • The PDF is the normal PDF divided by the probability mass \(Z=\Phi(\beta)-\Phi(\alpha)\) in the interval.

  • Truncation shifts the mean toward the interval and can induce strong skewness.

  • Likelihood-based inference should use the truncated likelihood; naive normal-model procedures can be biased.

  • A simple NumPy-only sampler is accept–reject from \(\mathcal{N}(\mu,\sigma^2)\).

  • In SciPy, a and b are standardized bounds: a=(lower-mu)/sigma, b=(upper-mu)/sigma.